Supporting linguistic annotation using XML
Abstract
Large-scale linguistic annotation is currently employed for a wide range of purposes, including comparing communication under different conditions, testing psycholinguistic hypotheses, and training natural language engines. Current software support for linguistic annotation is poor, with much of it written for one-off tasks using special purpose data representations and data handling routines. This impedes research because software cannot be reused and the resulting annotations can be difficult to use in analyses or applications for which they were not originally intended. This paper argues for a particular vision of how support for linguistic annotation could be provided using XML as the data format and stylesheets as the processing mechanism.
Similar resources
Annotating text using the Linguistic Description Scheme of MPEG-7: The DIRECT-INFO Scenario
We describe the way we adapted a text analysis tool for annotating with the Linguistic Description Scheme of MPEG-7 text related to and extracted from multimedia content. Practically applied in the DIRECT-INFO EC R&D project we show how such linguistic annotation contributes to semantic annotation of multimodal analysis systems, demonstrating also the use of the XML schema of MPEG-7 for support...
The HOLJ Corpus: Supporting Summarisation Of Legal Texts
We describe an XML-encoded corpus of texts in the legal domain which was gathered for an automatic summarisation project. We describe two distinct layers of annotation: manual annotation of the rhetorical status of sentences and an entirely automatic annotation process incorporating a host of individual linguistic processors. The manual rhetorical status annotation has been developed as trainin...
Multidimensional markup and heterogeneous linguistic resources
The paper discusses two topics: firstly an approach of using multiple layers of annotation is sketched out. Regarding the XML representation this approach is similar to standoff annotation. A second topic is the use of heterogeneous linguistic resources (e.g., XML annotated documents, taggers, lexical nets) as a source for semiautomatic multi-dimensional markup to resolve typical linguistic iss...
Representing and Accessing Multilevel Linguistic Annotation using the MEANING Format
We present an XML annotation format (MEANING Annotation Format, MAF) specifically designed to represent and integrate different levels of linguistic annotations and a tool that provides flexible access to them (MEANING Browser). We describe our experience in integrating linguistic annotations coming from different sources, and the solutions we adopted to implement efficient access to corpora an...
Querying XML Document Collections
In this paper we describe a query interface towards XML document collections. External schema annotation in RDF contains information used to dynamically build the interface tailored to the user’s characteristics and to the document structure, as described by its XML Schema. The interface makes the user aware of structure semantics, so supporting her/him in formulating semantically correct queri...